Abstract

Feed-forward multilayer neural networks implementing random input-output mappings
develop characteristic correlations between the activity of their hidden nodes which are
important for the understanding of the storage and generalization performance of the
network. It is shown how these correlations can be calculated from the joint probability
distribution of the aligning fields at the hidden units for an arbitrary decoder function
between hidden layer and output. Explicit results are given for the parity-, and-, and
committee-machines with an arbitrary number of hidden nodes near saturation.



Multilayer neural networks (MLN) are powerful information processing devices. Be- 
cause of their computational abilities they are the workhorses in practical applications 
of neural networks and a lot of effort is devoted to a thorough understanding of their 
functional principles. At the same time their theoretical analysis within the framework 
of statistical mechanics is much harder than that for the single-layer perceptron. It was 
realized from the beginning that the properties of the internal representations, defined as
the activity patterns of the hidden units resulting from certain inputs, are crucial for the
understanding of the storage and generalization abilities of MLN [1, 2, 3, 4]. Qualitatively
the flexibility of MLN stems from the fact that the different subperceptrons between
input and hidden layer can share the effort to produce the correct output. This division 
of labour gives rise to particular correlations between the activity of the hidden nodes. 
Near saturation these correlations become a characteristic feature of the decoder function 
between hidden units and output of the MLN under consideration and determine different 
aspects of its performance. 

Several ad-hoc approximations have been used to calculate these correlations, e.g., it
was assumed that all internal representations giving the correct output (so-called legal
internal representations, LIR) are equiprobable [5], or that only internal representations
at the decision boundary of the decoder function occur. In the present letter we show
how these correlations between the hidden units can be calculated for an MLN of tree
architecture and give explicit results for the parity- (PAR), and- (AND) and committee-
(COM) machine with an arbitrary number $K$ of hidden nodes near saturation.
An MLN of tree architecture is given by $N$ input nodes $\xi_{ik}$ grouped into $K$ sets of
$N/K$ nodes each, $K$ hidden nodes $\tau_k$ and one output $\sigma$. The inputs
$\boldsymbol{\xi}_k = \{\xi_{ik},\ i = 1,\ldots,N/K\}$ are coupled to the $k$-th hidden unit by
spherical couplings $\mathbf{J}_k = \{J_{ik} \in \mathbb{R},\ i = 1,\ldots,N/K,\ \mathbf{J}_k^2 = N/K\}$
according to $\tau_k = \mathrm{sgn}(\mathbf{J}_k \boldsymbol{\xi}_k)$. Each hidden node has
therefore its own set of inputs (non-overlapping receptive fields). The hidden units $\tau_k$
determine the output through a fixed Boolean function $F(\{\tau_k\})$. A set of input-output
mappings $\{\boldsymbol{\xi}^\mu, \sigma^\mu\}$, $\mu = 1,\ldots,p$, is generated at random where
each bit is $\pm 1$ with equal probability. The couplings $\mathbf{J}_k$ are then adjusted in
such a way that the MLN gives the desired output $\sigma^\mu$ for each input
$\boldsymbol{\xi}^\mu$. This is generically possible only if $p/N = \alpha < \alpha_c$.
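
As an illustration of the architecture just described, the following minimal Python sketch
evaluates the hidden units and the output of a tree machine for one random input. The
function names, the small sizes ($K = 3$ branches with 5 inputs each), the convention
sgn(0) = +1 and the choice of the committee decoder are assumptions made for this example
only. The couplings are drawn at random and are not trained.

```python
import math
import random

def sign(x):
    """Sign function with the (arbitrary) convention sgn(0) = +1."""
    return 1 if x >= 0 else -1

def random_spherical_couplings(size, rng):
    """Gaussian vector rescaled so that its squared length equals `size`."""
    j = [rng.gauss(0.0, 1.0) for _ in range(size)]
    scale = math.sqrt(size / sum(x * x for x in j))
    return [x * scale for x in j]

def hidden_units(couplings, pattern):
    """tau_k = sgn(J_k . xi_k), one value per non-overlapping branch."""
    return [sign(sum(j * x for j, x in zip(j_k, xi_k)))
            for j_k, xi_k in zip(couplings, pattern)]

def committee_decoder(tau):
    """Majority vote of the hidden units (odd K assumed)."""
    return sign(sum(tau))

rng = random.Random(0)
K, branch_size = 3, 5                      # N = K * branch_size input nodes
couplings = [random_spherical_couplings(branch_size, rng) for _ in range(K)]
pattern = [[rng.choice((-1, 1)) for _ in range(branch_size)] for _ in range(K)]

tau = hidden_units(couplings, pattern)     # internal representation
sigma = committee_decoder(tau)             # network output
print(tau, sigma)
```

Replacing `committee_decoder` by another Boolean function of `tau` gives the other machines
considered in this letter.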

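We are interested in the correlations

$$ c_n = \frac{1}{\alpha N} \sum_{\mu=1}^{\alpha N} \tau_{k_1}^\mu \tau_{k_2}^\mu \cdots \tau_{k_n}^\mu \tag{1} $$

near saturation, i.e. for $\alpha \to \alpha_c$. From the statistical properties of the inputs
it follows that

$$ c_n = \langle\langle \tau_{k_1} \tau_{k_2} \cdots \tau_{k_n} \rangle\rangle \tag{2} $$

where $\langle\langle \cdots \rangle\rangle$ denotes the average over the input-output pairs and
$k_1, \ldots, k_n$ is any set containing $n$ different natural numbers between 1 and $K$.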
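
The pattern average in eq. (1) can be sketched in a few lines of Python. The helper name
`correlation` and the toy data are hypothetical. The data mimic the situation before
learning, where the hidden units are independent and unbiased, so that every $c_n$ with
$n \ge 1$ should vanish up to sampling noise.

```python
import random
from math import prod

def correlation(reps, ks):
    """Average of tau_{k1} * ... * tau_{kn} over a list of internal representations."""
    return sum(prod(tau[k] for k in ks) for tau in reps) / len(reps)

rng = random.Random(1)
K, p = 5, 20000
# Untrained network: each hidden unit is +1 or -1 with equal probability.
reps = [[rng.choice((-1, 1)) for _ in range(K)] for _ in range(p)]

c2 = correlation(reps, (0, 3))       # n = 2, hidden units 0 and 3
c3 = correlation(reps, (0, 1, 2))    # n = 3
print(c2, c3)
```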

The $c_n$ can be calculated from the joint probability distribution
$P(\tau_1, \ldots, \tau_K)$ of internal representations.

The calculation of $P(\tau_1, \ldots, \tau_K)$ parallels the determination of the local
aligning field distribution for the perceptron [6] (see also [7, 8, 9]). The general result
within replica symmetry is

$$ P(\tau_1, \ldots, \tau_K) = \delta_{\sigma, F(\{\tau_k\})} \int \prod_k Dt_k \, \frac{\prod_k H(Q \tau_k t_k)}{\mathrm{Tr}'_{\tau_k} \prod_k H(Q \tau_k t_k)}, \tag{4} $$

where $\delta_{n,m}$ is the Kronecker symbol and the primed trace
$\mathrm{Tr}'_{\tau_k} = \mathrm{Tr}_{\tau_k}\, \delta_{\sigma, F(\{\tau_k\})}$ is restricted to
the legal internal representations. Moreover $Dt = e^{-t^2/2}\, dt/\sqrt{2\pi}$ and
$H(x) = \int_x^\infty Dt$ as usual. We do not display the saddle-point equation necessary to
determine $Q = \sqrt{q/(1-q)}$ as a function of $\alpha$ since we are mainly interested in the
saturation limit $\alpha \to \alpha_c$ implying $q \to 1$ and therefore $Q \to \infty$.

In this limit the integrand in (4) tends either to zero or to one depending on the values
of the $t_k$. When calculating $P(\tau_1, \ldots, \tau_K)$ explicitly for small $K$ and special
decoder functions one realizes a simple general rule. Consider the system before learning.
All internal representations have an a-priori probability $2^{-K}$. Those already compatible
with the desired output are not modified, all the others are shifted by the learning process
to the nearest decision boundary of the decoder function. This is reminiscent of the aligning
field distribution of the simple perceptron [6, 10] and has a natural interpretation within
the cavity approach [11]. On the basis of this general rule it is possible to determine
$P(\tau_1, \ldots, \tau_K)$ for arbitrary $K$ and arbitrary decoder function.
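
The general rule can be turned into a small enumeration, sketched here in Python. Two
assumptions go beyond the text: "nearest" is read as minimal Hamming distance, and the
weight of an eliminated representation is split equally among all nearest LIR. For the
three decoders treated in this letter this reading is justified by symmetry. For other
decoders it should be taken as a guess, not as the result of eq. (4). All function names
are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod

def saturated_distribution(K, decoder, sigma):
    """P(tau) at saturation for the desired output sigma (Hamming-distance reading)."""
    a_priori = Fraction(1, 2 ** K)
    reps = list(product((-1, 1), repeat=K))
    legal = [t for t in reps if decoder(t) == sigma]
    P = {t: a_priori for t in legal}           # legal representations keep their weight
    for t in reps:
        if decoder(t) == sigma:
            continue
        # Illegal representation: move its weight to the nearest legal ones.
        distance = {l: sum(a != b for a, b in zip(t, l)) for l in legal}
        d_min = min(distance.values())
        nearest = [l for l, d in distance.items() if d == d_min]
        for l in nearest:
            P[l] += a_priori / len(nearest)
    return P

def moment(P, n):
    """c_n = average of tau_1 * ... * tau_n under P."""
    return sum(w * prod(t[:n]) for t, w in P.items())

# Example: parity decoder with K = 3 and desired output +1.
K = 3
P = saturated_distribution(K, lambda t: prod(t), sigma=1)
moments = [moment(P, n) for n in range(1, K + 1)]
print(moments)
```

Exact rational arithmetic is used so that the resulting moments can be compared with closed
expressions without rounding issues. The enumeration grows like $4^K$ and is only meant for
small $K$.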

As examples we derive in the following explicit results for the PAR-, AND- and COM-machines
defined by the decoder functions $F(\{\tau_k\}) = \prod_k \tau_k$,
$F(\{\tau_k\}) = \mathrm{sgn}(\sum_k \tau_k - K + 1/2)$ and
$F(\{\tau_k\}) = \mathrm{sgn}(\sum_k \tau_k)$ respectively. For the PAR- and COM-machine we can
set all outputs equal to $+1$ without loss of generality for symmetry reasons whereas for the
AND-machine we have to stick to random outputs $\sigma = \pm 1$ with equal probability.

In the case of the PAR-machine all internal representations are at the decision boundary
of the decoder function. Hence all LIR gain in addition to their a-priori weight $2^{-K}$ an
equal share from the $2^{K-1}$ internal representations that are eliminated by the learning
process. Therefore for $\alpha \to \alpha_c$ all LIR have equal probability $2^{-K+1}$ which
results in $c_n = 0$ for all $n = 1, \ldots, K-1$ and $c_K = 1$.

In the case of the AND-machine there is only one LIR for the output $\sigma = +1$, namely
$\tau_1 = \ldots = \tau_K = 1$. It contributes $1/2$ to all $c_n$. If $\sigma = -1$ all but one
internal representations are LIR. Only those with exactly one $\tau_k = -1$ are at the decision
boundary of the decoder function and consequently only their probability is changed by the
learning process. For symmetry reasons it is clear that all of them get an equal share
$2^{-K}/K$ from the elimination of the internal representation
$\tau_1 = \ldots = \tau_K = +1$ in addition to their a-priori weight $2^{-K}$. The calculation
of the resulting contribution from $\sigma = -1$ to the correlations $c_n$ can be most easily
accomplished by observing that $c_n = 0$ for all $n$ before learning. To calculate $c_n$ after
learning one has hence only to take into account those LIR with exactly one $\tau_k = -1$.
The result is $c_n = \frac{1}{2} - n 2^{-K}/K$. As expected all correlations are dominated by
the restrictive case $\sigma = +1$ of the output.
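
The AND-machine result can be checked by brute force for small $K$. The sketch below, with
hypothetical function names, builds the saturated distribution for both outputs as described
above and averages over $\sigma = \pm 1$ with weight $1/2$ each.

```python
from fractions import Fraction
from itertools import product
from math import prod

def and_machine_moments(K):
    """Return [c_1, ..., c_K] for the AND-machine at saturation."""
    a_priori = Fraction(1, 2 ** K)
    all_plus = (1,) * K
    # sigma = -1: every representation except all_plus is legal.
    P_minus = {t: a_priori for t in product((-1, 1), repeat=K) if t != all_plus}
    for t in P_minus:
        if t.count(-1) == 1:                   # decision boundary
            P_minus[t] += a_priori / K         # equal share of the eliminated weight
    moments = []
    for n in range(1, K + 1):
        c_plus = 1                             # sigma = +1: only all_plus survives
        c_minus = sum(w * prod(t[:n]) for t, w in P_minus.items())
        moments.append(Fraction(1, 2) * c_plus + Fraction(1, 2) * c_minus)
    return moments

K = 4
moments = and_machine_moments(K)
expected = [Fraction(1, 2) - Fraction(n, K * 2 ** K) for n in range(1, K + 1)]
print(moments == expected)
```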

For the COM-machine the calculation is more involved. As usual we only consider odd
values of $K$. The decision boundary is given by all LIR with $\sum_k \tau_k = 1$. These gain
an equal share from the $2^{K-1}$ internal representations that have to be eliminated by the
learning. Hence

$$ P(\{\tau_k\}) = \begin{cases} 2^{-K} & \text{if } \sum_k \tau_k > 1 \\[4pt] 2^{-K} + \dfrac{1}{2} \dbinom{K}{\frac{K-1}{2}}^{-1} & \text{if } \sum_k \tau_k = 1 \\[4pt] 0 & \text{else.} \end{cases} $$
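
A short Python sketch makes this distribution concrete for small odd $K$. The function names
are hypothetical. The code builds $P(\{\tau_k\})$ from the three cases above, which allows a
check of the normalization and a brute-force evaluation of the moments.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def committee_distribution(K):
    """Saturated distribution of internal representations, output +1, odd K."""
    if K % 2 == 0:
        raise ValueError("K must be odd")
    a_priori = Fraction(1, 2 ** K)
    extra = Fraction(1, 2 * comb(K, (K - 1) // 2))   # share per boundary representation
    P = {}
    for t in product((-1, 1), repeat=K):
        s = sum(t)
        if s > 1:
            P[t] = a_priori
        elif s == 1:
            P[t] = a_priori + extra
    return P                                          # illegal representations are absent

def moment(P, n):
    return sum(w * prod(t[:n]) for t, w in P.items())

P = committee_distribution(5)
print(sum(P.values()), [moment(P, n) for n in range(1, 6)])
```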

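To determine the values of $c_n$ from this $P(\{\tau_k\})$ it is convenient to consider the
contribution from the regular part $P^{(r)}(\{\tau_k\}) = 2^{-K}$ if $\sum_k \tau_k \ge 1$ and
that from the extra part $P^{(e)}(\{\tau_k\}) = [2 \binom{K}{(K-1)/2}]^{-1}$ for
$\sum_k \tau_k = 1$ separately. The regular part contribution to even moments is zero due to
symmetry. Its contribution to odd moments is

$$ c_n^{(r)} = 2^{-K} \sum_{m=0}^{(K-1)/2} \sum_{i=0}^{n} (-1)^i \binom{n}{i} \binom{K-n}{m-i} $$

since $i = 0, \ldots, n$ of the $m = 0, \ldots, (K-1)/2$ minus ones of a LIR can be found in
$\tau_1, \ldots, \tau_n$ whereas the remaining $(m - i)$ minus ones are to be distributed
between the remaining $(K - n)$ units $\tau_{n+1}, \ldots, \tau_K$. After some algebra using
properties of binomial coefficients [12] this can be simplified to

$$ \begin{aligned} c_n^{(r)} &= 2^{-K} \sum_{i=0}^{n-1} (-1)^i \binom{n-1}{i} \binom{K-n}{\frac{K-1}{2} - i} \\ &= (-1)^{(n-1)/2}\, \frac{(n-2)!!}{(K-2)(K-4)\cdots(K-n+1)}\; 2^{-K} \binom{K-1}{\frac{K-1}{2}} \\ &= \frac{\Gamma(\frac{n}{2})\, \Gamma(1 - \frac{K}{2})\, \Gamma(K)}{2^K \sqrt{\pi}\; \Gamma(\frac{n-K+1}{2})\, [\Gamma(\frac{K+1}{2})]^2} \end{aligned} $$

where $(-1)!! = 1$ and the product in the denominator is empty for $n = 1$.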
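
The binomial manipulations for the regular part can be checked numerically. The following
Python sketch compares three integer expressions for odd $n$ and odd $K$, all without the
common factor $2^{-K}$: the double sum over the number of minus ones, a single alternating
sum, and a closed form in double-factorial notation. The explicit form of the latter two
follows the reading of the derivation adopted here and should be regarded as an assumption.

```python
from fractions import Fraction
from math import comb

def double_sum(K, n):
    """Sum over m minus ones, i of which sit among the first n hidden units."""
    M = (K - 1) // 2
    return sum((-1) ** i * comb(n, i) * comb(K - n, m - i)
               for m in range(M + 1) for i in range(min(n, m) + 1))

def single_sum(K, n):
    """Coefficient of x^M in (1 - x)^(n-1) (1 + x)^(K-n)."""
    M = (K - 1) // 2
    return sum((-1) ** i * comb(n - 1, i) * comb(K - n, M - i)
               for i in range(min(n - 1, M) + 1))

def closed_form(K, n):
    """(-1)^((n-1)/2) (n-2)!! / ((K-2)(K-4)...(K-n+1)) * binom(K-1, (K-1)/2)."""
    numerator = 1
    for f in range(n - 2, 0, -2):          # (n-2)!!, equal to 1 for n = 1
        numerator *= f
    denominator = 1
    for f in range(K - 2, K - n, -2):      # empty product for n = 1
        denominator *= f
    sign = (-1) ** ((n - 1) // 2)
    return Fraction(sign * numerator * comb(K - 1, (K - 1) // 2), denominator)

checks = [(K, n, double_sum(K, n), single_sum(K, n), closed_form(K, n))
          for K in (3, 5, 7, 9) for n in range(1, K + 1, 2)]
print(all(a == b == c for _, _, a, b, c in checks))
```

For even $n$ the double sum vanishes, in line with the symmetry argument for the regular part.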
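Similarly one gets for the extra part contribution

$$ c_n^{(e)} = \frac{1}{2} \binom{K}{\frac{K-1}{2}}^{-1} \sum_{i=0}^{n} (-1)^i \binom{n}{i} \binom{K-n}{\frac{K-1}{2} - i} \tag{9} $$

which results in

$$ c_n^{(e)} = \frac{(-1)^{(n-1)/2}}{2}\, \frac{n!!}{K(K-2)(K-4)\cdots(K-n+1)} \tag{10} $$

$$ \phantom{c_n^{(e)}} = \frac{\Gamma(\frac{n}{2} + 1)\, \Gamma(1 - \frac{K}{2})}{\sqrt{\pi}\, K\, \Gamma(\frac{n-K+1}{2})} \tag{11} $$

for $n$ odd and

$$ c_n^{(e)} = -c_{n-1}^{(e)} \tag{12} $$

if $n$ is even. The final result for the correlations of the COM-machine is hence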

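$$ c_n(K) = \frac{\Gamma(\frac{n+1}{2})\, \Gamma(-\frac{K}{2})}{2 \sqrt{\pi}\; \Gamma(\frac{n-K}{2})} \qquad \text{if $n$ even} \tag{13} $$

$$ c_n(K) = \frac{\Gamma(\frac{n}{2})\, \Gamma(1 - \frac{K}{2})}{\sqrt{\pi}\; \Gamma(\frac{n-K+1}{2})} \left[ \frac{\Gamma(K)}{2^K [\Gamma(\frac{K+1}{2})]^2} + \frac{n}{2K} \right] \qquad \text{if $n$ odd} \tag{14} $$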
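
The Gamma-function expressions for the COM-machine can be compared with a direct enumeration
over all legal internal representations. The Python sketch below does this for $K = 7$. The
function names are hypothetical, and the formulas coded in `c_gamma` are the even-$n$ and
odd-$n$ closed forms as read here, so agreement with the enumeration is a consistency check
of that reading, nothing more.

```python
from itertools import product
from math import comb, gamma, pi, prod, sqrt

def c_enumerated(K, n):
    """Brute-force c_n from the saturated committee distribution (odd K)."""
    a_priori = 2.0 ** -K
    extra = 0.5 / comb(K, (K - 1) // 2)
    total = 0.0
    for t in product((-1, 1), repeat=K):
        s = sum(t)
        if s >= 1:
            weight = a_priori + (extra if s == 1 else 0.0)
            total += weight * prod(t[:n])
    return total

def c_gamma(K, n):
    """Closed forms in terms of Gamma functions (odd K, 1 <= n <= K)."""
    if n % 2 == 0:
        return (gamma((n + 1) / 2) * gamma(-K / 2)
                / (2 * sqrt(pi) * gamma((n - K) / 2)))
    prefactor = (gamma(n / 2) * gamma(1 - K / 2)
                 / (sqrt(pi) * gamma((n - K + 1) / 2)))
    bracket = gamma(K) / (2 ** K * gamma((K + 1) / 2) ** 2) + n / (2 * K)
    return prefactor * bracket

K = 7
table = [(n, c_enumerated(K, n), c_gamma(K, n)) for n in range(1, K + 1)]
for n, brute, closed in table:
    print(n, round(brute, 6), round(closed, 6))
```

All Gamma-function arguments are half-integers for odd $K$ and integer $n$, so no poles are
hit. For non-integer $n$ the same expressions provide smooth interpolations.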



Note that for $n$ even one has $c_{K-n+1} = (-1)^{(K+1)/2} c_n$. As an example these results
are shown in the figure for $K = 25$.




-0.05 I ■ ' ■ ' ■ ' ■ ' ■ 1 
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n 

Moments of internal representations for a committee machine with K = 25 hidden 
units. The symbols are the results for integer n following from eqs.{^ and the 
full and dotted line are given by eqs.{13) and (14) respectively. The inset shows an 
enlarged region of the plot. 
It is straightforward to obtain the asymptotic behaviour of the moments for $K \to \infty$.
For the COM-machine moments $c_n$ with either $n$ or $K - n$ small remain the largest ones
in this limit. Explicitly one gets with the abbreviation $C = 1/\sqrt{2\pi K}$:
$c_1 \approx C$, $c_2 = -1/(2K)$, $c_3 \approx -C/K$, $c_4 = 3/(2K(K-2))$,
$c_5 \approx 3C/K^2$ and $c_K \approx (-1)^{(K-1)/2}/2$, $c_{K-1} = (-1)^{(K-1)/2}/(2K)$,
$c_{K-2} \approx (-1)^{(K+1)/2}/(2K)$, $c_{K-3} = (-1)^{(K+1)/2}\, 3/(2K(K-2))$,
$c_{K-4} \approx (-1)^{(K-1)/2}\, 3/(2K^2)$.
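
The approach to the asymptotic regime can be examined numerically. The Python sketch below
evaluates the moments of the COM-machine exactly from their double-factorial form (regular
plus extra part for odd $n$, minus the preceding extra part for even $n$) and compares $c_1$
with $C = 1/\sqrt{2\pi K}$. The function names are hypothetical and the closed forms are
those adopted in this reconstruction. The relative corrections decay only like $K^{-1/2}$,
so large $K$ is needed for close agreement.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def falling_product(start, stop):
    """start * (start - 2) * ... * stop, equal to 1 if start < stop."""
    result = 1
    for f in range(start, stop - 1, -2):
        result *= f
    return result

def extra_part(K, n):
    """Boundary (extra) contribution for odd n."""
    sign = (-1) ** ((n - 1) // 2)
    return Fraction(sign * falling_product(n, 1), 2 * falling_product(K, K - n + 1))

def regular_part(K, n):
    """Contribution of the unmodified a-priori weights for odd n."""
    sign = (-1) ** ((n - 1) // 2)
    c1_regular = Fraction(comb(K - 1, (K - 1) // 2), 2 ** K)
    ratio = Fraction(falling_product(n - 2, 1), falling_product(K - 2, K - n + 1))
    return sign * ratio * c1_regular

def moment(K, n):
    """Exact c_n for odd K and 1 <= n <= K."""
    if n % 2 == 0:
        return -extra_part(K, n - 1)
    return regular_part(K, n) + extra_part(K, n)

K = 10001
C = 1 / sqrt(2 * pi * K)
ratio_c1 = float(moment(K, 1)) / C     # tends to 1 for large K
c_last = float(moment(K, K))           # tends to (-1)^((K-1)/2) / 2
print(ratio_c1, c_last)
```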

So far we have considered an MLN with fixed decoder function and have determined the
correlations $c_n$ resulting near saturation. It is tempting to investigate also the
complementary question and to determine the storage capacity of an ensemble of $K$ uncoupled
perceptrons with prescribed correlations $c_n$. For the COM-machine it is, e.g., known that
already from the prescription of $c_1$ alone one gets the correct RS asymptotics of the
storage capacity $\alpha_c$ (which is, however, known to be unstable with respect to RSB). It
is interesting to see whether the inclusion of other correlations can alter this asymptotics
[13].
Finally it should be noted that the results obtained in this letter rely on the assumption 
of replica symmetry for the determination of the aligning field distribution whereas it is 
well known that replica symmetry breaking is crucial for the calculation of the storage 
capacity of MLN. On the other hand it is merely the qualitative behaviour of the aligning 
field distribution that is important for the determination of the $c_n$. Since this is known
to be hardly modified by RSB [14, 15] it seems likely that the results for the correlations
will not be significantly altered by the inclusion of RSB. 

References 

[1] M. Mezard and S. Patarnello, On the Capacity of Feedforward Layered Networks,
LPTENS-preprint 1989, unpublished

[2] M. Griniasty and T. Grossman, Phys. Rev. A45, 8924 (1992) 

[3] A. Priel, M. Blatt, T. Grossman, E. Domany, and I. Kanter, Phys. Rev. E50, 577 
(1994) 

[4] B. Schottky, J. Phys. A28, 4515 (1995) 

[5] R. Monasson and R. Zecchina, Phys. Rev. Lett. 75, 2432 (1995) 

[6] T. B. Kepler and L. F. Abbott, J. Phys. France 49, 1657 (1988)

[7] E. Gardner, J. Phys. A22, 1969 (1989) 

[8] E. Barkai, D. Hansel, and H. Sompolinsky, Phys. Rev. A45, 4146 (1992) 

[9] M. Griniasty, Phys. Rev. E47, 4496 (1993) 

[10] M. Opper, Phys. Rev. A38, 3824 (1988)

[11] K. Y. M. Wong, Europhys. Lett. 30, 245 (1995) 

[12] D. E. Knuth, The Art of Computer Programming (Addison-Wesley, Menlo Park,
1967), volume 1

[13] D. Malzahn, A. Engel, and I. Kanter, in preparation 

[14] R. Erichsen and W. K. Theumann, J. Phys. A26, L61 (1993)



[15] P. Majer, A. Engel, and A. Zippelius, J. Phys. A26, 7405 (1993)


